A Long-Text Classification Method of Chinese News Based on BERT and CNN
Abstract
Text classification is an important research area in natural language processing (NLP) that has received considerable scholarly attention in recent years. However, real Chinese online news is characterized by long text, a large amount of information, and complex structure, which reduces the accuracy of text classification. To improve the accuracy of long-text classification of Chinese news, we propose a BERT-based local feature convolutional network (LFCN) model that includes four novel modules. First, to address the limitation of Bidirectional Encoder Representations from Transformers (BERT) on the maximum input sequence length, we propose a method named Dynamic LEAD-n (DLn) that extracts short texts from within the long text, based on the traditional LEAD digest algorithm. Second, in the Text-Text Encoder (TTE) module, we use the pretrained BERT model to produce a sentence-level vector representation of the text and to capture global features, using the attention mechanism to identify correlated words in the text. After that, a CNN-based local feature convolution (LFC) module captures local features such as key phrases. Finally, the feature vectors generated by the different operations over several periods are fused and used to predict the category of the news text. Experimental results show that the new method further improves the accuracy of long-text classification of Chinese news.
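The abstract does not spell out how DLn selects text, so the following is only a minimal sketch of the traditional LEAD-n idea it builds on: keep the first n sentences of a document, stopping early if a length budget would be exceeded. The function names, the punctuation set, the use of character count as a stand-in for BERT token count, and the default budget of 510 (512 positions minus the two special tokens) are all assumptions made for illustration, not the authors' method.

```python
import re

# One sentence = a run of non-terminator characters plus any trailing
# terminators. The punctuation set is an illustrative assumption.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text: str) -> list:
    """Split text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def lead_n(text: str, n: int = 3, max_chars: int = 510) -> str:
    """Plain LEAD-n baseline (hypothetical helper, not the paper's DLn).

    Keeps at most the first n sentences and stops before the running
    character count would exceed max_chars. Character count is used as a
    rough proxy for token count, which is an assumption.
    """
    sentences = split_sentences(text)
    selected = []
    used = 0
    for sentence in sentences[:n]:
        if used + len(sentence) > max_chars:
            break
        selected.append(sentence)
        used += len(sentence)
    # If even the first sentence is too long, fall back to a hard cut.
    if not selected and sentences:
        return sentences[0][:max_chars]
    return "".join(selected)


if __name__ == "__main__":
    news = "今天下雨。明天晴！后天呢？大后天多云。"
    print(lead_n(news, n=2))
```

A dynamic variant would presumably choose n or the extraction positions per document rather than fixing them, but the abstract gives no details, so that part is left out here.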
Similar resources
A New Document Embedding Method for News Classification
Abstract: Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which makes access to the news easier and faster. The first step in text classification is to represent documents in a suitable way t...
On the Practicality and Effectiveness of a Personalized Eclectic Method Incorporated into Iranian High School EFL Syllabus
In step with the growing pace of creativity and innovation in language teaching, especially the emergence of communicative language teaching, many language schools have rethought their pedagogy and updated their traditional methods to include communicative activities and personalized contexts. Iranian schools, however, have been slow to move in this direction. The main aim of the present study is therefore to take a step toward filling the gap between emerging educational theories and...
On the Relationship between Actual vs. Perceived Difficulties of a Text and Iranian EFL Learners' Reading Anxiety
Abstract: This study examines the relationship between reading anxiety and the difficulty of texts, as well as the relationship between reading anxiety and students' perceived difficulty of the texts. Since difficulty is a relative concept, I limited its definition by sticking to the readability formula. We also took students' perceived difficulty levels into account. Therefore, in the present study, ...
The Innovation of a Statistical Model to Estimate Dependable Rainfall (DR) and Develop It for Determination and Classification of Drought and Wet Years of Iran
Water from precipitation meets countless needs of living beings, especially humans, and any decline in its quantity or quality directly and negatively affects the life of living organisms. Year-to-year fluctuation is a fundamental and very important feature of Iran's annual precipitation, and its harmful effects are reflected in one way or another in every economic, social, and even political and security sphere. Because the amount of water from precipitation is one of the main components of the program ...
Chinese Short Text Classification Based on Domain Knowledge
People are generating more and more short texts. There is an urgent demand to classify short texts into different domains. Due to the shortness and sparseness of short texts, conventional methods based on Vector Space Model (VSM) have limitations. To tackle the data scarcity problem, we propose a new model to directly measure the correlation between a short text instance and a domain instead of...
Journal
Journal title: IEEE Access
Year: 2022
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2022.3162614